TABLE 3.3
Performance contributions of the components in RBCNs on CIFAR100 (accuracy, %), where Bi = Bi-Real Net, R = RBConv, G = GAN, and B = update strategy.

Model | Kernel Stage  | Bi    | R     | R+G   | R+G+B
RBCN  | 32-32-64-128  | 54.92 | 56.54 | 59.13 | 61.64
RBCN  | 32-64-128-256 | 63.11 | 63.49 | 64.93 | 65.38
RBCN  | 64-64-128-256 | 63.81 | 64.13 | 65.02 | 66.27

Note: The numbers in bold represent the best results.
3) We further improve RBCNs by updating the BN layers, with W and C fixed, after each epoch (line 17 in Algorithm 13). This strategy further increases accuracy by 2.51% (61.64% vs. 59.13%) on CIFAR100 with the 32-32-64-128 kernel stage.
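The idea behind this update strategy can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the preceding layer's binarized weights W (and scale C) are held fixed, and the BN statistics are simply refit on the activations produced during the epoch. The 1x1 "convolution" as a matrix product is a hypothetical stand-in for the real binarized convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_fixed(x, w):
    # Stand-in for a binarized convolution whose weights W and scale C
    # are frozen (hypothetical 1x1 conv as a matrix product).
    return x @ w

def refresh_bn(features, eps=1e-5):
    # Refit BatchNorm statistics over the epoch's activations while the
    # preceding layer's parameters stay fixed.
    mean = features.mean(axis=0)
    var = features.var(axis=0)
    return mean, var, (features - mean) / np.sqrt(var + eps)

w = np.sign(rng.standard_normal((8, 4)))   # fixed binarized weights
x = rng.standard_normal((256, 8))          # one epoch of inputs
mean, var, normed = refresh_bn(conv_fixed(x, w))

# After the refresh, the normalized activations are zero-mean and
# (approximately) unit-variance, realigning BN with the frozen weights.
```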
3.7 BONN: Bayesian Optimized Binary Neural Network
First, we briefly introduce Bayesian learning, a paradigm for constructing statistical models based on Bayes' theorem that provides practical learning algorithms and helps us understand other learning algorithms. Bayesian learning shows its signifi-
FIGURE 3.19
The evolution of the prior p(x), the distribution of the observation y, and the posterior p(x|y) during learning, where x is the latent variable representing the full-precision parameters and y is the quantization error. Initially, the parameters x are initialized according to a single-mode Gaussian distribution. When our learning algorithm converges, the ideal case is that (i) p(y) becomes a Gaussian distribution N(0, ν), which corresponds to the minimum reconstruction error, and (ii) p(x|y) = p(x) is a Gaussian mixture distribution with two modes located at the binarized values x̂ and −x̂.
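The ideal converged posterior described in the caption can be checked numerically: an equal-weight two-mode Gaussian mixture centered at the binarized values ±x̂. The concrete values below (x̂ = 1, mode standard deviation 0.3) are arbitrary choices for illustration, not parameters from the method.

```python
import numpy as np

def gaussian(x, mu, sigma):
    # Univariate Gaussian density N(mu, sigma^2).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_posterior(x, x_hat=1.0, sigma=0.3):
    # Ideal converged posterior p(x|y): an equal-weight two-mode Gaussian
    # mixture centered at the binarized values +x_hat and -x_hat.
    return 0.5 * gaussian(x, x_hat, sigma) + 0.5 * gaussian(x, -x_hat, sigma)

xs = np.linspace(-2.0, 2.0, 401)
density = mixture_posterior(xs)

# The density peaks at the two binarized values rather than at zero, so
# full-precision weights concentrate around +x_hat and -x_hat.
peak = xs[np.argmax(density)]
```

Because the mixture is bimodal, the full-precision parameters are pulled toward the two binarized values, which is exactly the behavior the figure depicts at convergence.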